Faster Worst Case Deterministic Dynamic Connectivity

Abstract

We present a deterministic dynamic connectivity data structure for undirected graphs with
worst case update time $O\left(\sqrt{n(\log\log n)^2/\log n}\right)$ and constant query time.
This improves on the previous best deterministic worst case algorithm of Frederickson (STOC, 1983)
and Eppstein, Galil, Italiano, and Nissenzweig (J. ACM, 1997), which had update time
$O(\sqrt{n})$. All other algorithms for dynamic connectivity are either randomized (Monte Carlo)
or have only amortized performance guarantees.

1 Introduction 

Dynamic Connectivity is perhaps the single most fundamental unsolved problem in the area of 
dynamic graph algorithms. The problem is simply to maintain a dynamic undirected graph G = 
(V, E) subject to edge updates and connectivity queries: 

Insert(u, v) : Set E ← E ∪ {(u, v)}.

Delete(u, v) : Set E ← E \ {(u, v)}.

Conn?(u, v) : Determine whether u and v are in the same connected component in G.
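For orientation, the three operations can be realized by a naive baseline that stores adjacency
sets and answers queries by breadth-first search. This is only a reference sketch of the
interface, not any structure from the paper; the class and method names are illustrative.
Updates take constant expected time, but a query may traverse the whole graph, which is the cost
that dynamic connectivity structures are designed to avoid.

```python
from collections import defaultdict, deque


class NaiveConnectivity:
    """Baseline: O(1) expected updates, O(n + m) queries via BFS."""

    def __init__(self):
        self.adj = defaultdict(set)

    def insert(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def conn(self, u, v):
        # Breadth-first search from u, stopping as soon as v is reached.
        seen, queue = {u}, deque([u])
        while queue:
            x = queue.popleft()
            if x == v:
                return True
            for y in self.adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        return False


g = NaiveConnectivity()
g.insert(1, 2)
g.insert(2, 3)
print(g.conn(1, 3))
g.delete(2, 3)
print(g.conn(1, 3))
```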

Over thirty years ago Frederickson [10] introduced topology trees and 2-dimensional topology
trees, which gave the first non-trivial solution to the problem. Each edge insertion/deletion is
handled in $O(\sqrt{m})$ time and each query is handled in $O(1)$ time. Here m is the current number
of edges and n the number of vertices. On sparse graphs (where m = O(n)) Frederickson's data
structure has seen no unqualified improvements or simplifications. However, when the graph is
dense Frederickson's data structure can be sped up using the general sparsification method of
Eppstein, Galil, Italiano, and Nissenzweig [6]. Using simple sparsification [7] the update time
becomes $O(\sqrt{n}\log(m/n))$ and using more sophisticated sparsification [6] the running time
becomes $O(\sqrt{n})$. This last bound has not been improved in twenty years.

Most research on the dynamic connectivity problem has settled for amortized update time
guarantees. Following earlier work [16], Holm et al. gave a very simple deterministic algorithm
with amortized update time $O(\log^2 n)$ and query time $O(\log n/\log\log n)$.¹ However, in the
worst case Holm et al.'s update takes $\Omega(m)$ time, the same as computing a spanning tree from
scratch! Recently Wulff-Nilsen [25] improved the update time of Holm et al. to
$O(\log^2 n/\log\log n)$. Using Las Vegas randomization, Thorup [24] gave a dynamic connectivity
structure with an $O(\log n(\log\log n)^3)$ amortized update time. In other words, the algorithm
answers all connectivity queries correctly but the amortized update time holds with high
probability.

In a major breakthrough Kapron, King, and Mountjoy [18] used Monte Carlo randomization to
achieve a worst case update time of $O(\log^5 n)$. However, this algorithm has three notable
drawbacks. The first is that it is susceptible to undetected false negatives: Conn?(u, v) may
report that u, v are disconnected when they are, in fact, connected. The second is that even when
Conn?(u, v) (correctly) reports that u, v are connected, it is forbidden from exhibiting a
connectivity witness, i.e., a spanning forest in which u, v are joined by a path. The Kapron et
al. [18] algorithm does maintain such a spanning forest internally, but if this witness were made
public, a very simple attack could force the algorithm to answer connectivity queries
incorrectly. Lastly, the algorithm uses $\Omega(n\log^2 n)$ space, which for sparse graphs is
superlinear in m. Very recently Gibb et al. [13] reduced the update time of Kapron et al. [18] to
$O(\log^4 n)$.

On special graph classes, dynamic connectivity can often be handled more efficiently. For
example, Sleator and Tarjan [22] maintain a dynamic set of trees in $O(\log n)$ worst-case update
time subject to $O(\log n)$ time connectivity queries. (See also [33], [3], [1], [23].)
Connectivity in dynamic planar graphs can be reduced to the dynamic tree problem, and therefore
solved in $O(\log n)$ time per operation. The cell probe lower bounds of Pătraşcu and Demaine [20]
show that Sleator and Tarjan's bounds are optimal in the sense that some operation must take
$\Omega(\log n)$ time. Superlogarithmic updates can be used to get modestly sublogarithmic queries,
but Pătraşcu and Thorup [21] prove the reverse is not possible. In particular, any dynamic
connectivity algorithm with $o(\log n)$ update time has $n^{1-o(1)}$ query time. Refer to Table 1
in the appendix for a history of upper and lower bounds for dynamic connectivity.

1.1 New Results 

In this paper we return to the classical model of deterministic worst case complexity. We give a
new dynamic connectivity structure with worst case update time on the order of

$$\min\left\{ \sqrt{\frac{m(\log\log n)^2}{\log n}},\ \sqrt{\frac{m\log^5 w}{w}} \right\},$$

where $w = \Omega(\log n)$ is the word size.² These are the first improvements to Frederickson's
2D-topology trees [10] in over 30 years. Using the sparsification reduction of Eppstein et al. [6]
the running time expressions can be made to depend on 'n' rather than 'm', so we obtain
$O\left(\sqrt{n(\log\log n)^2/\log n}\right)$ bounds (or faster) for all graph densities.

Compared to the amortized algorithms discussed above, ours is better suited to online applications

1 Any connectivity structure that maintains (internally) a spanning forest can have query time
$O(\log_{t_u/\log n} n)$ if the update time is $t_u = \Omega(\log n)$.

2 Our algorithms use the standard repertoire of $AC^0$ operations: left and right shifts, bitwise
operations on words, additions and comparisons. They do not assume unit-time multiplication.

that demand a bound on the latency of every operation.³ Compared to the Monte Carlo algorithms
[18, 13], ours is attractive in applications that demand linear space, zero probability of
error, and a public witness of connectivity.

The modest speedup obtained by our algorithm over [10] comes from word-level parallelism,
which is widely used in both theory and practice. However, just because the underlying machine
can operate on $w = \Omega(\log n)$ bits at once does not mean that poly(w)-factor speedups come
easily or automatically. The true contribution of this work is in reorganizing the representation
of the graph so that word-level parallelism becomes a viable technique. As a happy byproduct, we
develop an approach to worst case dynamic connectivity that is conceptually simpler than
Frederickson's (2-dimensional) topology trees.

Organization. In Section 2 we describe our high level approach, without getting into low-level
data structural details. Section 3 gives a relatively simple instantiation of the high-level
approach with update time $O(\sqrt{n}/w^{1/4})$, which is slightly slower than our claimed result.
In Section 4 we describe the modifications needed to achieve the claimed bounds.

2 The High Level Algorithm 

The algorithm maintains a spanning tree of each connected component of the graph as a witness
of connectivity. Each such witness tree T is represented as an Euler tour Euler(T).⁴ Euler(T)
is the sequence of vertices encountered in some Euler tour around T, as if each undirected edge
were replaced by two oriented edges. It has length precisely $2(|V(T)| - 1)$ if $|V(T)| \ge 2$
(the last vertex, which is necessarily the same as the first, is excluded from the list) or
length 1 if $|V(T)| = 1$. Vertices may appear in Euler(T) several times. We designate one copy of
each vertex the principal copy, which is responsible for all edges incident to the vertex. Each
vertex in the graph maintains a pointer to its principal copy. Each T-edge (u, v) maintains two
pointers to the (possibly non-principal) copies of u and v that precede the oriented occurrences
of (u, v) and (v, u) in Euler(T), respectively. Note that cyclic rotations of Euler(T) are also
valid Euler tours; if Euler(T) = (u, ..., v) the last element of the list is associated with the
tree edge (v, u).
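The tour representation described above can be illustrated with a short depth-first traversal.
This is a minimal sketch, assuming the tree is given as a dictionary of adjacency lists; the
function name `euler_tour` is illustrative. Each appended vertex is the copy that precedes one
oriented edge, so a tree on at least two vertices yields exactly 2(|V(T)| - 1) entries.

```python
def euler_tour(adj, root):
    """Return Euler(T) as a list of vertices for the tree `adj`, starting at `root`."""
    tour = []

    def visit(u, parent):
        for v in adj[u]:
            if v != parent:
                tour.append(u)   # copy of u preceding the oriented edge (u, v)
                visit(v, u)
                tour.append(v)   # copy of v preceding the oriented edge (v, u)

    visit(root, None)
    return tour or [root]        # a single-vertex tree has the length-1 tour


tree = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"]}
tour = euler_tour(tree, "a")
print(tour, len(tour))
```

The recursion is used only for brevity; very deep trees would need an explicit stack.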

When an edge (u, v) that connects distinct witness trees $T_0$ and $T_1$ is inserted, (u, v)
becomes a tree edge and we need to construct Euler($T_0 \cup \{(u,v)\} \cup T_1$) from Euler($T_0$)
and Euler($T_1$). In the reverse situation, if a tree edge (u, v) is deleted from
$T = T_0 \cup \{(u,v)\} \cup T_1$ we first construct Euler($T_0$) and Euler($T_1$) from Euler(T),
then look for a replacement edge $(\hat u, \hat v)$ with $\hat u \in V(T_0)$ and
$\hat v \in V(T_1)$. If a replacement is found we construct
Euler($T_0 \cup \{(\hat u,\hat v)\} \cup T_1$) from Euler($T_0$) and Euler($T_1$). Lemma 2.1
establishes the nearly obvious fact that the new Euler tours can be obtained from the old Euler
tours using $O(1)$ of the following surgical operations: splitting and concatenating lists of
vertices, and creating and destroying singleton lists containing non-principal copies of vertices.


3 Amortized data structures are most useful when employed by offline algorithms that do not care
about individual operation times. The canonical example is the use of amortized Fibonacci heaps
to implement Dijkstra's algorithm [5].

4 Henzinger and King were the first to use Euler tours to represent dynamic trees. G. Italiano
(personal communication) observed that Euler tours could be used in lieu of Frederickson's
topology trees to obtain an $O(\sqrt{m})$-time dynamic connectivity structure.

Lemma 2.1. If $T = T_0 \cup \{(u,v)\} \cup T_1$ and (u, v) is deleted, Euler($T_0$) and Euler($T_1$)
can be constructed from Euler(T) with $O(1)$ surgical operations. In the opposite direction, from
Euler($T_0$) and Euler($T_1$) we can construct Euler($T_0 \cup \{(u,v)\} \cup T_1$) with $O(1)$
surgical operations. It takes $O(1)$ time to determine which surgical operations to perform.

Proof. Recall that cyclic shifts of Euler tours are valid Euler tours. Suppose without loss of
generality that Euler(T) = $(P_0, u, v, P_1, v, u, P_2)$ where $P_0$, $P_1$, and $P_2$ are
sequences of vertices. (Note that Euler tours never contain immediate repetitions. If $P_1$ is
empty then Euler(T) would be just $(P_0, u, v, u, P_2)$; if both $P_0$ and $P_2$ are empty then
Euler(T) = $(u, v, P_1, v)$.) Then we obtain Euler($T_0$) = $(P_0, u, P_2)$ and
Euler($T_1$) = $(v, P_1)$ with $O(1)$ surgical operations, which includes the destruction of
non-principal copies of u and v; at least one of the two copies must be non-principal. We could
also set Euler($T_1$) = $(P_1, v)$, which would be more economical if the v following $P_1$ in
Euler(T) were the principal copy.

In the reverse direction, write Euler($T_0$) = $(P_0, u, P_1)$ and Euler($T_1$) = $(P_2, v, P_3)$,
where the labeled occurrences are the principal copies of u and v. Then
Euler($T_0 \cup \{(u,v)\} \cup T_1$) = $(P_0, u, v, P_3, P_2, v, u, P_1)$, where the new copies of
u and v are clearly non-principal copies. If $P_2$ and $P_3$ were empty (or $P_0$ and $P_1$ were
empty) then we would not need to add a non-principal copy of v (or a non-principal copy of u). □
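The two directions of Lemma 2.1 can be sketched on plain Python lists. This is an illustration
under simplifying assumptions, not the paper's implementation: tours are ordinary lists, the
oriented occurrences are found by linear scans rather than by stored pointers, and slicing costs
linear time rather than O(1) surgical operations. The names `cut` and `link` are hypothetical, and
`link` treats the first occurrence of each endpoint as its principal copy.

```python
def cut(tour, u, v):
    """Split Euler(T) into (Euler(T0), Euler(T1)) after deleting tree edge (u, v)."""
    n = len(tour)
    # Rotate so the tour reads (u, v, P1, v, u, P2); rotations are valid tours.
    i = next(i for i in range(n) if tour[i] == u and tour[(i + 1) % n] == v)
    t = tour[i:] + tour[:i]
    # Locate the copy of v that precedes the oriented edge (v, u).
    j = next(j for j in range(1, n) if t[j] == v and t[(j + 1) % n] == u)
    tour1 = t[1:j] or [v]          # (v, P1), or the singleton (v)
    tour0 = [u] + t[j + 2:]        # (u, P2), or the singleton (u)
    return tour0, tour1


def link(tour0, u, tour1, v):
    """Build Euler(T0 + (u, v) + T1) from Euler(T0) and Euler(T1)."""
    i, j = tour0.index(u), tour1.index(v)
    p0, p1 = tour0[:i], tour0[i + 1:]
    p2, p3 = tour1[:j], tour1[j + 1:]
    extra_v = [v] if len(tour1) > 1 else []   # no second copy for a singleton T1
    extra_u = [u] if len(tour0) > 1 else []   # no second copy for a singleton T0
    return p0 + [u, v] + p3 + p2 + extra_v + extra_u + p1


joined = link(["a", "b"], "b", ["c", "d"], "c")
print(joined)
print(cut(joined, "b", "c"))
```

Linking the path a-b to the path c-d through the edge (b, c) and then cutting that edge again
returns tours of the two original paths, up to rotation.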

Define $\hat m$ to be an upper bound on m, the number of edges. The update time of our data
structure will be a function of $\hat m$. The sparsification method of [6] creates instances in
which $\hat m$ is known to be linear in the number of vertices.

2.1 A Dynamic List Data Structure 

We have reduced dynamic connectivity in graphs to implementing several simple operations on
dynamic lists. We will maintain a pair $(\mathcal{L}, E)$, where $\mathcal{L}$ is a set of lists
(containing principal and non-principal copies of vertices) and E is the dynamic set of edges
joining principal copies of vertices. In addition to the creation and destruction of single
element lists we must support the following primitive operations.

List(x) : Return the list in $\mathcal{L}$ containing element x.

Join($L_0$, $L_1$) : Set $\mathcal{L} \leftarrow \mathcal{L} \setminus \{L_0, L_1\} \cup \{L_0 L_1\}$,
that is, replace $L_0$ and $L_1$ with their concatenation $L_0 L_1$.

Split(x) : Let $L = L_0 L_1 \in \mathcal{L}$, where x is the last element of $L_0$. Set
$\mathcal{L} \leftarrow \mathcal{L} \setminus \{L\} \cup \{L_0, L_1\}$.

ReplacementEdge($L_0$, $L_1$) : Return any edge joining elements in $L_0$ and $L_1$.

Our implementations of these operations will only be efficient if, after each Insert or Delete
operation, there are no edges connecting distinct lists. That is, the ReplacementEdge operation
is only employed by Delete when deleting a tree edge in order to restore Invariant 2.2.

Invariant 2.2. Each list $L \in \mathcal{L}$ corresponds to the Euler tour of a spanning tree of
some connected component.

The dynamic connectivity operations are implemented as follows. To answer a Conn?(u, v)
query we simply check whether List(u) = List(v). To insert an edge (u, v) we do Insert(u, v),
and if List(u) ≠ List(v) then make (u, v) a tree edge and perform suitable Splits and Joins to
merge the Euler tours List(u) and List(v). To delete an edge (u, v) we do Delete(u, v), and if
(u, v) is a tree edge in $T = T_0 \cup \{(u,v)\} \cup T_1$, perform suitable Splits and Joins to
create Euler($T_0$) and Euler($T_1$) from Euler(T). At this point Invariant 2.2 may be violated as
there could be an edge joining $T_0$ and $T_1$. We call ReplacementEdge(Euler($T_0$), Euler($T_1$))
and if it finds an edge, say $(\hat u, \hat v)$, we perform more Splits and Joins to form
Euler($T_0 \cup \{(\hat u,\hat v)\} \cup T_1$).

Henzinger and King observed that most off-the-shelf balanced binary search trees can
support Split, Join, and other operations in logarithmic time. However, they provide no direct
support for the ReplacementEdge operation, which is critical for the dynamic connectivity
application.

3 A New Dynamic Connectivity Structure 

3.1 Chunks and Superchunks 

In order to simplify the maintenance of Invariant 3.1 stated below, we shall assume that the
maximum degree never exceeds K, where $K \approx \sqrt{\hat m}/\mathrm{poly}(w)$ is a parameter of
the algorithm. Refer to Appendix A for a discussion of clean ways to remove this assumption.

If L' is a sublist of a list $L \in \mathcal{L}$, define mass(L') to be the number of edges
incident to elements of L', counting an edge twice if both endpoints are in L'. The sum of list
masses, $\sum_{L \in \mathcal{L}} \mathrm{mass}(L)$, is clearly at most $2\hat m$, where $\hat m$
is the fixed upper bound on the number of edges. We maintain a partition of each list
$L \in \mathcal{L}$ into chunks satisfying Invariant 3.1.

Invariant 3.1. Let $L \in \mathcal{L}$ be an Euler tour. If mass(L) < K then L consists of a single
chunk. Otherwise $L = C_0 C_1 \cdots C_{p-1}$ is partitioned into $\Theta(\mathrm{mass}(L)/K)$
chunks such that $\mathrm{mass}(C_l) \in [K, 3K]$ for all $l \in [p]$.
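To see that a partition satisfying Invariant 3.1 always exists under the maximum-degree
assumption, consider the following greedy sketch. It is an illustration only, not the paper's
maintenance procedure: the input is a hypothetical list `masses` giving the mass contributed by
each list element (the degree for a principal copy, 0 for a non-principal copy), and each entry is
assumed to be at most K.

```python
def partition_into_chunks(masses, K):
    """Greedily split element indices into chunks with mass in [K, 3K]."""
    if sum(masses) < K:
        return [list(range(len(masses)))]      # a light list is a single chunk
    chunks, current, current_mass = [], [], 0
    for index, mass in enumerate(masses):
        current.append(index)
        current_mass += mass
        if current_mass >= K:                  # closed chunks have mass in [K, 2K)
            chunks.append(current)
            current, current_mass = [], 0
    if current:                                # leftover has mass < K: merge it left
        chunks[-1].extend(current)
    return chunks


masses = [3, 0, 2, 4, 0, 1, 5, 0, 2, 1]
print(partition_into_chunks(masses, 5))
```

A chunk is closed as soon as its mass reaches K, so it holds less than 2K before the leftover of
mass below K is merged into the last chunk, which keeps every chunk within [K, 3K].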

The chunks are grouped into superchunks, each a contiguous sequence of $O(h)$ chunks, according
to Invariant 3.2. For the time being define $h = 2\lfloor\sqrt{w}/2\rfloor$, where w is the word
size.

Invariant 3.2. A list in $\mathcal{L}$ having fewer than h/2 chunks forms a single superchunk with
ID ⊥. A list in $\mathcal{L}$ with at least h/2 chunks is partitioned into superchunks, each
consisting of between h/2 and h - 1 consecutive chunks. Each such superchunk has a unique ID in
[J], where $J = 4\hat m/(Kh)$. (IDs are completely arbitrary. They do not encode any information
about the order of superchunks within a list.)

Call an Euler tour list short if it consists of fewer than h/2 chunks. We shall assume that no
lists are ever short, as this simplifies the description of the data structure and its analysis.
In particular, all superchunks have proper IDs in [J]. In Section B we sketch the uninteresting
complications introduced by ⊥ IDs and short lists.

3.2 Word Operations 

When $h \le \lfloor\sqrt{w}\rfloor$, Invariant 3.2 implies that we can store a matrix
$A \in \{0,1\}^{h \times h}$ in one word that represents the adjacency between the chunks within
two superchunks i and j. This matrix will always be represented in row-major order; rows and
columns are indexed by [h] = {0, ..., h - 1}. In this format it is straightforward to insert a
new all-zero row above a specified row k (and destroy row h - 1) by shifting the old rows
k, ..., h - 2 down by one. It is also easy to copy an interval of rows from one matrix to another.
Lemma 3.3 shows that the corresponding operations on columns can also be effected in $O(1)$ time
with a fixed mask μ precomputable in $O(\log w)$ time.

Lemma 3.3. Let $h = 2\lfloor\sqrt{w}/2\rfloor$ and let μ be the word $(1^h 0^h)^{h/2}$. Given μ we
can in $O(1)$ time copy/paste any interval of columns from/to a matrix $A \in \{0,1\}^{h \times h}$
represented in row-major order.

Proof. Recall that the rows and columns are indexed by integers in [h] = {0, ..., h - 1}. We first
describe how to build a mask $\nu_k$ for columns k, ..., h - 1 then illustrate how it is used to
copy/paste intervals of columns. In C notation⁵ the word ν'_k = (μ >> k) & μ is a mask for the
intersection of the even rows and columns k, ..., h - 1, so ν_k = ν'_k | (ν'_k >> h) is a mask for
columns k through h - 1.

To insert an all-zero column before column k of A (and delete column h - 1) we first copy
columns k, ..., h - 2 to A' = A & (ν_{k+1} << 1) then set A = (A & (~ν_k)) | (A' >> 1). Other
operations can be effected in $O(1)$ time with copying/pasting intervals of columns, e.g.,
splitting an array into two about a designated column, or merging two arrays having at most h
columns together. □
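The column operation in the proof of Lemma 3.3 can be checked with a small sketch. It assumes the
bit layout suggested by the mask definition: the matrix is written as a binary string in
row-major order whose leftmost (most significant) bit is entry (0, 0), so a right shift moves bits
toward higher column indices. Python integers stand in for machine words, and the helper names
are illustrative.

```python
def make_mu(h):
    """The mask (1^h 0^h)^(h/2): all ones on even rows, all zeros on odd rows."""
    return int(("1" * h + "0" * h) * (h // 2), 2)


def column_mask(mu, h, k):
    """Mask nu_k selecting columns k..h-1 of every row (zero when k = h)."""
    even_rows = (mu >> k) & mu
    return even_rows | (even_rows >> h)


def insert_zero_column(a, mu, h, k):
    """Insert an all-zero column before column k and drop column h-1."""
    moved = a & (column_mask(mu, h, k + 1) << 1)    # columns k..h-2
    return (a & ~column_mask(mu, h, k)) | (moved >> 1)


def to_word(rows):
    """Encode a 0/1 matrix in row-major order, entry (0, 0) most significant."""
    return int("".join(str(bit) for row in rows for bit in row), 2)


def from_word(a, h):
    """Decode an h*h-bit word back into a list of rows."""
    bits = format(a, "0{}b".format(h * h))
    return [[int(bits[r * h + c]) for c in range(h)] for r in range(h)]


h = 4
mu = make_mu(h)
rows = [[1, 0, 1, 1], [0, 1, 0, 1], [1, 1, 1, 0], [0, 0, 1, 1]]
print(from_word(insert_zero_column(to_word(rows), mu, h, 1), h))
```

Here `make_mu` builds μ from a string for clarity; the paper only states that μ is precomputable
in O(log w) time.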

3.3 Adjacency Data Structures 

In order to facilitate the efficient implementation of ReplacementEdge we maintain an
$O(\hat m/K) \times O(\hat m/K)$ adjacency matrix between chunks, and a $J \times J$ adjacency
matrix between superchunks. However, in order to allow for efficient dynamic updates it is
important that these matrices be represented in a non-standard format described below. The data
structure maintains the following information.

• Each list element maintains a pointer to the chunk containing it. Each chunk maintains a
pointer to the superchunk containing it, as well as an index in [h] indicating its position
within the superchunk. Each superchunk maintains its ID in $[J] \cup \{\perp\}$ and a pointer to
the list containing it.

• ChAdj is a $J \times J$ array of $h^2$-bit words ($h^2 \le w$) indexed by superchunk IDs. The
entry ChAdj(i, j) is interpreted as an $h \times h$ 0-1 matrix that keeps the adjacency
information between all pairs of chunks in superchunk i and superchunk j. (It may be that i = j.)
In particular, ChAdj(i, j)(k, l) = 1 iff there is an edge with endpoints in the kth chunk of
superchunk i and the lth chunk of superchunk j, so ChAdj(i, j) = 0 (i.e., the all-zero matrix) if
no edge joins superchunks i and j. The matrix ChAdj(i, j) is stored in row-major order.

• Let S be a superchunk with ID(S) = ⊥. By Invariants 2.2 and 3.2, S is not incident to any
other superchunks and has fewer than h/2 chunks. We maintain a single word $\mathrm{ChAdj}_S$
which stores the adjacency matrix of the chunks within S.

• For each superchunk with ID $i \in [J]$ we keep length-J bit-vectors $\mathrm{SupAdj}_i$ and
$\mathrm{Memb}_i$, where

$\mathrm{SupAdj}_i(j) = 1$ if ChAdj(i, j) ≠ 0 and 0 otherwise, whereas
$\mathrm{Memb}_i(j) = 1$ if j = i and 0 otherwise.

These vectors are packed into $\lceil J/w\rceil$ machine words, so scanning one takes
$O(\lceil J/w\rceil)$ time.

• We maintain a list-sum data structure that allows us to take the bit-wise OR of the
$\mathrm{SupAdj}_i$ vectors or $\mathrm{Memb}_i$ vectors, over all superchunks in an Euler tour. It
is responsible for maintaining the $\{\mathrm{SupAdj}_i, \mathrm{Memb}_i\}$ vectors described above
and supports the following operations. At all times the superchunks are partitioned into a set
$\mathcal{S}$ of disjoint lists of superchunks. Each $S \in \mathcal{S}$ (a list of superchunks) is
associated with an $L \in \mathcal{L}$ (an Euler tour), though short lists in $\mathcal{L}$ have no
need for a corresponding list in $\mathcal{S}$.

5 The operations &, |, and ~ are bit-wise AND, OR, and NOT; << and >> are left and right shift.

SCInsert(i) : Retrieve an unused ID, say i', and allocate a new superchunk with ID i' and
all-zero vector $\mathrm{SupAdj}_{i'}$. Insert superchunk i' immediately after superchunk i in i's
list in $\mathcal{S}$. If no i is given, create a new list in $\mathcal{S}$ consisting of
superchunk i'.

SCDelete(i) : Delete superchunk i from its list and make ID i unused.

SCJoin($S_0$, $S_1$) : Replace superchunk lists $S_0, S_1 \in \mathcal{S}$ with their
concatenation $S_0 S_1$.

SCSplit(i) : Let $S = S_0 S_1 \in \mathcal{S}$ and i be the last superchunk in $S_0$. Replace
$S_0 S_1$ with two lists $S_0$, $S_1$.

UpdateAdj(i, $x \in \{0,1\}^J$) : Set $\mathrm{SupAdj}_i \leftarrow x$ and update
$\mathrm{SupAdj}_j(i) \leftarrow x(j)$ for all j ≠ i.

AdjQuery(S) : Return the vector $\alpha \in \{0,1\}^J$ where

$$\alpha(j) = \bigvee_{i \in S} \mathrm{SupAdj}_i(j).$$

The index i ranges over the IDs of all superchunks in S.

MembQuery(S) : Return the vector $\beta \in \{0,1\}^J$, where

$$\beta(j) = \bigvee_{i \in S} \mathrm{Memb}_i(j).$$

We use the following implementation of the list-sum data structure. Each list of superchunks
is maintained as any $O(1)$-degree search tree that supports logarithmic time inserts, deletes,
splits, and joins. Each leaf is a superchunk that stores its two bit-vectors. Each internal node
z keeps two bit-vectors, $\mathrm{SupAdj}_z$ and $\mathrm{Memb}_z$, which are the bit-wise OR of
their leaf descendants' respective bit-vectors. Because length-J bit-vectors can be updated in
$O(\lceil J/w\rceil)$ time, all “logarithmic time” operations on the tree actually take
$O(\log J \cdot J/w)$ time. The UpdateAdj(i, x) operation takes $O(\log J \cdot J/w)$ time to
update superchunk i and its $O(\log J)$ ancestors. We then need to update the ith bit of
potentially every other node in the tree, in $O(J)$ time. Since
$w = \Omega(\log n) = \Omega(\log J)$ the cost per UpdateAdj is $O(J)$. The answer to an
AdjQuery(S) or MembQuery(S) is stored at the root of the tree on S.

3.4 Creating and Destroying (Super)Chunks 

There are essentially two causes for the creation and destruction of (super)chunks. The first is
in response to a Split operation that forces a (super)chunk to be broken up. (The Split may
itself be instigated by the insertion or deletion of an edge.) The second is to restore
Invariants 3.1 and 3.2 after a Join or Insert or Delete operation. In this section we consider
the problem of updating the adjacency data structures after four types of operations: (i)
splitting a chunk in two, keeping both chunks in the same superchunk, (ii) merging two adjacent
chunks in the same superchunk, (iii) splitting a superchunk along a chunk boundary, and (iv)
merging adjacent superchunks. Once we have bounds on (i)-(iv), implementing the higher-level
operations in the stated bounds is relatively straightforward. Note that (i)-(iv) may temporarily
violate Invariants 3.1 and 3.2.

Splitting Chunks. Suppose we want to split the kth chunk of superchunk i into two pieces,
both of which will (at least temporarily) stay within superchunk i.⁶ We first zero-out all bits
of ChAdj(i, ⋆)(k, ⋆) and ChAdj(⋆, i)(⋆, k) in $O(J)$ time. For each j we need to insert an
all-zero row below row k in ChAdj(i, j) and an all-zero column after column k of ChAdj(j, i). This
can be done in $O(1)$ time for each j, or $O(J)$ in total; see Lemma 3.3.

6 Remember that 'k' refers to the actual position of the chunk within its superchunk whereas 'i'
is an arbitrary ID that does not relate to its position within the list.

In $O(K)$ time we scan the edges incident to the new chunks k and k + 1 and update the
corresponding bits in ChAdj(i, ⋆)(k', ⋆) and ChAdj(⋆, i)(⋆, k'), for k' ∈ {k, k + 1}.

Merging Adjacent Chunks. In order to merge chunks k and k + 1 of superchunk i we need to
replace row k of ChAdj(i, j), for all j, with the bit-wise OR of rows k and k + 1 of ChAdj(i, j),
zero out row k + 1, then scoot rows k + 2, ... back one row. A similar transformation is
performed on columns k and k + 1 of ChAdj(j, i), which takes $O(1)$ time per j, by Lemma 3.3. In
total the time is $O(J)$, independent of K.

Splitting Superchunks. Suppose we want to split superchunk i after its kth chunk. We first call
SCInsert(i), which allocates an empty superchunk with ID i' and inserts i' after i in its
superchunk list in $\mathcal{S}$. In $O(J)$ time we transfer rows k + 1, ..., h - 1 from
ChAdj(i, j) to ChAdj(i', j) and transfer columns k + 1, ..., h - 1 from ChAdj(j, i) to
ChAdj(j, i'). By Lemma 3.3 this takes $O(1)$ time per j.

At this point ChAdj is up-to-date but the list-sum data structure and $\{\mathrm{SupAdj}_i\}$
bit-vectors are not. We update $\mathrm{SupAdj}_i$, $\mathrm{SupAdj}_{i'}$ with calls to
UpdateAdj(i, x) and UpdateAdj(i', x'). Using ChAdj, each bit of x and x' can be generated in
constant time. This takes $O(J)$ time.

Merging Superchunks. Let the two adjacent superchunks have IDs i and i'. It is guaranteed
that they will be merged only if they contain at most h chunks together. In $O(J)$ time we
transfer the non-zero rows of ChAdj(i', j) to ChAdj(i, j) and transfer the non-zero columns of
ChAdj(j, i') to ChAdj(j, i). A call to SCDelete(i') deletes superchunk i' from its list in
$\mathcal{S}$ and retires ID i'. We then call UpdateAdj(i, x) with the new incidence vector x. In
this case we can generate x in $O(J/w)$ time since it is merely the bit-wise OR of the old vectors
$\mathrm{SupAdj}_i$ and $\mathrm{SupAdj}_{i'}$, with bit i' set to zero. Updating the list-sum data
structure takes $O(J)$ time.

3.5 Joining and Splitting Lists 

Once we have routines for splitting and merging adjacent (super)chunks, implementing Join and
Split on lists in $\mathcal{L}$ is much easier. The goal is to restore Invariant 3.1 governing
chunk masses and Invariant 3.2 on the number of chunks per superchunk.

Performing Join($L_0$, $L_1$). Write $L_0 = C_0, \ldots, C_{p-1}$ and
$L_1 = D_0, \ldots, D_{q-1}$ as a list of chunks. If both $L_0$ and $L_1$ are not short then they
have corresponding superchunk lists $S_0, S_1 \in \mathcal{S}$. Call SCJoin($S_0$, $S_1$) to join
$S_0$, $S_1$ in $\mathcal{S}$, in $O(J)$ time.

Performing Split(x). Suppose x is contained in chunk $C_l$ of
$L = C_0 \cdots C_{l-1} C_l C_{l+1} \cdots C_{p-1}$. We split $C_l$ into two chunks
$C'_l C''_l$, and split the superchunk containing $C_l$ along this line. Let S be the superchunk
list corresponding to L and i be the ID of the superchunk ending at $C'_l$. We split S using a
call to SCSplit(i), which corresponds to splitting L into $L_0 = C_0 \cdots C_{l-1} C'_l$ and
$L_1 = C''_l C_{l+1} \cdots C_{p-1}$. At this point $C'_l$ or $C''_l$ may violate Invariant 3.1 if
$\mathrm{mass}(C'_l) < K$ or $\mathrm{mass}(C''_l) < K$. Furthermore, Invariant 3.2 may be violated
if the number of chunks in the superchunks containing $C'_l$ and $C''_l$ is too small. We first
correct Invariant 3.1 by possibly merging and resplitting $C_{l-1} C'_l$ and $C''_l C_{l+1}$ along
new boundaries. If the superchunk containing $C'_l$ has fewer than h/2 chunks, it and the
superchunk to its left have strictly between h/2 and 3h/2 chunks together, and so can be merged
(and possibly resplit) into one or two superchunks satisfying Invariant 3.2. The same method can
correct a violation of $C''_l$'s superchunk. This takes $O(K + J)$ time.

Performing ReplacementEdge($L_0$, $L_1$). The list-sum data structure makes implementing the
ReplacementEdge($L_0$, $L_1$) operation easy. Let $S_0$ and $S_1$ be the superchunk lists
corresponding to Euler tours $L_0$ and $L_1$. We compute the vectors
$\alpha \leftarrow$ AdjQuery($S_0$) and $\beta \leftarrow$ MembQuery($S_1$) and their bit-wise AND
$\alpha \wedge \beta$ with a linear scan of both vectors. If $\alpha \wedge \beta$ is the all-zero
vector then there is no edge between $L_0$ and $L_1$. On the other hand, if
$(\alpha \wedge \beta)(j) = 1$, then j must be the ID of a superchunk in $S_1$ that is incident to
some superchunk in $S_0$. To determine which superchunk in $S_0$ we walk down from the root of
$S_0$'s list-sum tree to a leaf, say with ID i, in each step moving to a child z of the current
node for which $\mathrm{SupAdj}_z(j) = 1$. Once i and j are known we retrieve any 1-bit in the
matrix ChAdj(i, j), say at position (k, l), indicating that the kth chunk of superchunk i and the
lth chunk of superchunk j are adjacent. We scan all edges incident to that chunk in $O(K)$ time
and retrieve an edge joining $L_0$ and $L_1$. The total time is
$O(J/w + \log J + K) = O(J/w + K)$.
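The search for a replacement edge can be sketched with Python integers as bit-vectors. This is a
simplified illustration, not the list-sum tree itself: the OR over a superchunk list and the
descent to a leaf are replaced by linear scans over the IDs, and all names (`sup_adj`, `ch_adj`,
`replacement_candidate`) are hypothetical. Matrices use a row-major layout with entry (0, 0) as
the most significant of h*h bits.

```python
def replacement_candidate(sup_adj, ch_adj, s0, s1, h):
    """Return (i, j, k, l) with chunk k of superchunk i adjacent to chunk l of j, or None.

    sup_adj[i] is the bit-vector SupAdj_i, ch_adj[(i, j)] the h*h-bit matrix ChAdj(i, j),
    and s0, s1 are the lists of superchunk IDs of the two Euler tours.
    """
    alpha = 0
    for i in s0:                        # AdjQuery(S0): OR of the SupAdj vectors
        alpha |= sup_adj[i]
    beta = 0
    for j in s1:                        # MembQuery(S1): bit j set for each j in S1
        beta |= 1 << j
    both = alpha & beta
    if both == 0:
        return None                     # no edge joins the two tours
    j = (both & -both).bit_length() - 1            # lowest set bit of alpha AND beta
    i = next(i for i in s0 if (sup_adj[i] >> j) & 1)
    word = ch_adj[(i, j)]
    position = h * h - word.bit_length()           # first 1 in row-major order
    return i, j, position // h, position % h


sup_adj = {0: 0, 1: 1 << 3, 2: 0, 3: 1 << 1}
ch_adj = {(1, 3): 0b0010, (3, 1): 0b0100}
print(replacement_candidate(sup_adj, ch_adj, [0, 1], [2, 3], 2))
```

Given the returned chunk positions, the remaining step in the text is to scan the edges of the
kth chunk of superchunk i for one whose other endpoint lies in the second tour.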

Performing Insert(u, v). If List(u) ≠ List(v), first perform $O(1)$ Splits and Joins to restore
the Euler tour Invariant 2.2. Now u and v are in the same list in $\mathcal{L}$. Let i, j be the
IDs of the superchunks containing the principal copies of u and v and let k, l be the positions of
u and v's chunks within their respective superchunks. We set ChAdj(i, j)(k, l) ← 1. If ChAdj(i, j)
was formerly the all-zero matrix, we call UpdateAdj(i, x) to update superchunk i's adjacency
information with the correct vector x.⁷ Inserting one edge changes the mass of the chunks
containing u and v, which could violate Invariant 3.1. Invariants 3.1 and 3.2 are restored by
splitting/merging $O(1)$ chunks and superchunks.

Performing Delete(u, v). Compute i, j, k, l as defined above, in $O(1)$ time. After we delete
(u, v) the correct value of the bit ChAdj(i, j)(k, l) is uncertain. We scan chunk k of superchunk
i in $O(K)$ time, looking for an edge connected to chunk l of superchunk j. If we do not find such
an edge we set ChAdj(i, j)(k, l) ← 0, and if that makes ChAdj(i, j) = 0 (the all-zero matrix), we
call UpdateAdj(i, x), where x is the new adjacency vector of superchunk i; it only differs from
the former $\mathrm{SupAdj}_i$ at position j.

If (u, v) is a tree edge in $T = T_0 \cup \{(u,v)\} \cup T_1$ we perform Splits and Joins to
replace Euler(T) with Euler($T_0$), Euler($T_1$), which may violate Invariant 2.2 if there is a
replacement edge between $T_0$ and $T_1$. We call ReplacementEdge(Euler($T_0$), Euler($T_1$)) to
find a replacement edge. If one is found, say $(\hat u, \hat v)$, we form
Euler($T_0 \cup \{(\hat u,\hat v)\} \cup T_1$) with a constant number of Splits and Joins.

3.6 Running Time Analysis 

Each operation ultimately involves splitting/merging $O(1)$ chunks, superchunks, and lists, which
takes time $O(K + J + \log n \cdot J/w) = O(K + J) = O(K + \hat m/(K\sqrt{w}))$. We balance the
terms by setting $K = \sqrt{\hat m/\sqrt{w}}$ so the running time is $O(K)$.

7 Since x only differs from the former $\mathrm{SupAdj}_i$ at position $\mathrm{SupAdj}_i(j)$, this
update to the list-sum tree takes just $O(\log J)$ time since it only affects ancestors of leaves
i and j.

By the sparsification transformation of Eppstein, Galil, Italiano, and Nissenzweig [6] this implies
an update time of $O(\sqrt{n}/w^{1/4})$. Each instance of dynamic connectivity created by [6] has
a fixed set of vertices, say of size $\hat n$, and a fixed upper bound $\hat m = O(\hat n)$ on the
number of edges.
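As a check on the balancing step, here is a short worked derivation. It uses only quantities
defined earlier in this section: $h = \Theta(\sqrt{w})$, $J = 4\hat m/(Kh)$, and $\hat m$, the
fixed upper bound on the number of edges.

```latex
\begin{align*}
J &= \frac{4\hat m}{Kh} = \Theta\!\left(\frac{\hat m}{K\sqrt{w}}\right), \\
K = \frac{\hat m}{K\sqrt{w}}
  &\iff K^2 = \frac{\hat m}{\sqrt{w}}
  \iff K = \sqrt{\hat m/\sqrt{w}} = \frac{\sqrt{\hat m}}{w^{1/4}}, \\
O(K + J) &= O\!\left(\frac{\sqrt{\hat m}}{w^{1/4}}\right),
  \quad\text{which is } O\!\left(\frac{\sqrt{n}}{w^{1/4}}\right)
  \text{ once sparsification guarantees } \hat m = O(n).
\end{align*}
```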


4 Speeding Up the Algorithm 

Observe that there are $O((\hat m/(Kh))^2)$ matrices (ChAdj(i, j)) but only $\hat m$ edges, so for
$K = \sqrt{\hat m/h}$, the average $h \times h$ matrix has $O(h)$ 1s. Thus, storing each such
matrix verbatim, using $h^2$ bits, is information theoretically inefficient on average. By storing
only the locations of the 1s in each matrix we can represent each matrix in $O(h\log h)$ bits on
average and thereby hope to solve dynamic connectivity faster with a larger 'h' parameter.

The Encoding. In this encoding we index rows and columns by indices in {1, ..., h} rather
than [h]. Let $m_{ij} = m_{ji}$ be the number of 1s in ChAdj(i, j). We encode ChAdj(i, j) by
listing its 1 positions in $O(m_{ij}\log h/w)$ lightly packed words. Each word is partitioned into
fields of $1 + 2\lceil\log(h+1)\rceil$ bits: each field consists of a control bit (normally 0), a
row index, and a column index. Each word is between half-full and full, the fields in use being
packed contiguously in the word. This invariant allows us to insert a new field after a given
field in $O(1)$ time. We list the 1s of either ChAdj(i, j) or ChAdj(j, i) = ChAdj(i, j)$^T$ in
row-major order, with a bit indicating which of the two representations is used.

Fast Operations. Given ChAdj(i, j) in row-major order, we can determine if ChAdj(i, j)(k, l) = 1
in $O(\log((m_{ij}\log h)/w)) = O(\log h)$ time, as follows. By doing a binary search over the
first field in each word we can determine which word (if any) has a field containing (k, l): the
binary encoding of (k, l). If we add $2^{2\lceil\log(h+1)\rceil} - (k, l)$ to each field in the
word, the control bits for all fields that are equal to or greater than (k, l) will be flipped to
1. Similarly, if we set all control bits to 1 and subtract (k, l) + 1 from each field, the control
bits of fields that are equal to or less than (k, l) will be flipped to 0. Thus, we can single out
the control bit for an occurrence of (k, l) (if any) with $O(1)$ bit-wise operations. If (k, l) is
not present, the control bits reveal the field in the word after which it could be inserted, if we
need to set ChAdj(i, j)(k, l) ← 1.
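The control-bit comparison can be tried out on Python integers. This is a sketch under stated
assumptions rather than the paper's word-level code: every field of the word is assumed to be in
use, a pair (k, l) is encoded as `(k << b) | l` with b = ⌈log(h + 1)⌉, the replicated constants
are built with a multiplication for brevity (the paper's model avoids unit-time multiplication,
so they would be precomputed), and the final decoding loop exists only to display the result.

```python
def pack(values, width):
    """Pack field values into one integer, field 0 in the least significant bits."""
    word = 0
    for index, value in enumerate(values):
        word |= value << (index * width)
    return word


def encode(k, l, b):
    """Binary encoding of the pair (k, l) using b bits per index."""
    return (k << b) | l


def compare_fields(word, count, b, key):
    """Return (ge, eq): indices of fields >= key and of fields == key."""
    width = 1 + 2 * b                       # control bit + row index + column index
    ones = pack([1] * count, width)         # a 1 in the lowest bit of every field
    control = ones << (2 * b)               # the control bit of every field
    ge = (word + ones * ((1 << (2 * b)) - key)) & control    # carry sets bit if field >= key
    gt = ((word | control) - ones * (key + 1)) & control     # bit survives if field > key
    eq = ge & ~gt

    def decode(bits):
        return [i for i in range(count) if (bits >> (i * width + 2 * b)) & 1]

    return decode(ge), decode(eq)


h = 8
b = h.bit_length()                          # equals ceil(log2(h + 1))
fields = [encode(1, 2, b), encode(1, 5, b), encode(3, 3, b), encode(4, 1, b)]
word = pack(fields, 1 + 2 * b)
print(compare_fields(word, len(fields), b, encode(3, 3, b)))
print(compare_fields(word, len(fields), b, encode(2, 1, b)))
```

In the second query the pair (2, 1) is absent, and the first field reported as greater or equal
marks where it would be inserted to keep the word in row-major order.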

In the same time bound we can also identify the positions of the first and last 1s in row k. Thus,
we can perform the following operations on ChAdj(i, j) in $O((m_{ij}\log h)/w)$ time: setting a
row to zero, incrementing/decrementing the row-index of some interval of rows, or copying an
interval of rows.

The operations sketched above are only efficient if ChAdj(i,j) is in row-major order. If we have
ChAdj(i,j)$^T$ in row-major order we can effect a transpose by (1) swapping the row and column
indices in each field using masks and shifts, and (2) sorting the fields. In general, sorting $x$ words
of $O(w/\log h)$ fields takes $O(x(\log^2(w/\log h) + \log x\log(w/\log h)))$ time using Albers and
Hagerup’s implementation [2] of Batcher’s bitonic mergesort [2].⁸ We sort each word in
$O(\log^2(w/\log h))$ time, resulting in $x$ sorted lists, then iteratively merge the two shortest lists
until one list remains. Merging two lists containing $y$ words takes $O(y\log(w/\log h))$ time: we can
merge the next $w/\log h$ fields of each list in $O(\log(w/\log h))$ time [2] and output at least
$w/\log h$ items to the merged list.

8 Albers and Hagerup also require that the fields to be sorted begin with control bits. 
Alternatively, if $w = \log n$ we can sort and merge lists of $\epsilon\log n/\log h$ fields in unit
time using table lookup to precomputed tables of size $O(n^\epsilon)$. In this case sorting $x$ packed
words takes $O(x\log x)$ time.
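The two-step transpose (swap the indices, then sort) can be illustrated with the sketch below. It makes several simplifying assumptions: the whole list lives in one unbounded Python integer rather than in several $w$-bit words, the function name `transpose_row_major` is hypothetical, and step (2) uses Python's built-in sort as a stand-in for the packed bitonic mergesort or table lookup, so it does not reproduce the stated running times.

```python
def transpose_row_major(ones, b):
    """Given the (row, col) positions of a matrix's 1s in row-major order,
    return the 1 positions of its transpose, again in row-major order.
    `b` is the number of bits per index."""
    fbits = 1 + 2 * b
    count = len(ones)
    low = (1 << b) - 1

    # Pack the list: row index in the high half of each field, control bit 0.
    word = 0
    for pos, (r, c) in enumerate(ones):
        word |= ((r << b) | c) << (pos * fbits)

    # Step (1): swap the two b-bit halves of every field with masks and shifts.
    col_mask = 0
    for pos in range(count):
        col_mask |= low << (pos * fbits)
    row_mask = col_mask << b
    swapped = ((word & row_mask) >> b) | ((word & col_mask) << b)

    # Step (2): sort the fields (stand-in for the packed sorting routine).
    field_mask = (1 << fbits) - 1
    fields = sorted((swapped >> (pos * fbits)) & field_mask for pos in range(count))
    return [(f >> b, f & low) for f in fields]
```

Because the row index occupies the high half of a field, sorting fields numerically after the swap yields exactly the row-major order of the transpose.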


Splitting and Joining. Splitting and joining (super)chunks is now slightly more expensive. When
handling superchunk $i$ (or any chunk within it) we first put each ChAdj(i,j) in row-major order, in
$\sum_{j=1}^{J} O((1 + m_{ij}\log h/w)\log^2 h) = O(J\log^2 h + (Kh/w)\log^3 h)$ time since, by
Invariants 13.11 and 13.21, $\sum_j m_{ij} = O(Kh)$. Once the relevant superchunks are in the correct
format, splitting or joining $O(1)$ (super)chunks takes $O(K\log h + J + (Kh/w)\log h)$ time. Since
$J = O(m/(Kh))$, the overall update time is


$$O\left(K\log h + \frac{m\log^2 h}{Kh} + \frac{Kh\log^3 h}{w}\right).$$


Setting $h = w$ and $K = \sqrt{\frac{m}{w\log w}}$, the overall time is $O(\sqrt{m\log^5 w/w})$. When
$w = O(\log n)$ the cost of taking the transpose is cheaper since sorting and merging a packed word
takes unit time via table lookup. Setting $h = \log n$, the total time is


$$O\left(K\log\log n + \frac{m}{K\log n} + K(\log\log n)^2\right),$$

which is $O\left(\sqrt{\frac{m(\log\log n)^2}{\log n}}\right)$ when
$K = \sqrt{\frac{m}{\log n\,(\log\log n)^2}}$. $\Box$
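Both parameter choices can be checked by balancing the dominant terms. The following worked sketch ignores constant factors and assumes the update bound has the form $O(K\log h + m\log^2 h/(Kh) + Kh\log^3 h/w)$, as obtained by adding the stated costs of reformatting and of splitting or joining.

```latex
% Case h = w: the term K log h is dominated by K h log^3 h / w = K log^3 w.
K\log^3 w = \frac{m\log^2 w}{Kw}
\;\Longrightarrow\; K^2 = \frac{m}{w\log w}
\;\Longrightarrow\; K\log^3 w = \sqrt{\frac{m\log^5 w}{w}}.

% Case w = O(log n), h = log n, with unit-time sorting of a packed word:
% the term K log log n is dominated by K (log log n)^2.
K(\log\log n)^2 = \frac{m}{K\log n}
\;\Longrightarrow\; K^2 = \frac{m}{\log n\,(\log\log n)^2}
\;\Longrightarrow\; K(\log\log n)^2 = \sqrt{\frac{m(\log\log n)^2}{\log n}}.
```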
A Removing the Bounded Degree Assumption

Invariants 13.11 and 13.21 imply that there are $J = \Theta(m/(Kh))$ superchunks with non-$\perp$ IDs.
However, Invariant 13.11 cannot be satisfied (as stated) unless the maximum degree is bounded by
$O(K)$. One way to guarantee this is to physically split up high degree vertices, replacing each $v$
with a cycle on new vertices $v_1,\ldots,v_{\lceil\deg(v)/O(K)\rceil}$, each of which is responsible for
$O(K)$ of $v$’s edges. This is the method used by Frederickson [10], who actually demanded that the
maximum degree be 3 at all times!

This vertex-splitting can be effectively simulated in our algorithm as follows. If $\deg(v) > K/2$,
replace the principal copy of $v$ in its Euler tour with an interval of artificial principal vertices
$v_1,\ldots,v_{\lceil\deg(v)/(K/2)\rceil}$, each of which is responsible for between $K/2$ and $K$ of
$v$’s edges. Invariant 13.11 is therefore maintained w.r.t. this modified tour. To keep the mass of
artificial vertices between $K/2$ and $K$, each edge insertion/deletion may require splitting an
artificial vertex or merging two consecutive artificial vertices. When the Euler tour changes we always
preserve the invariant that $v$’s artificial vertices form a contiguous interval in the tour.
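The split and merge rule for artificial vertices can be sketched as below. This is a simplified illustration rather than the paper's procedure: it tracks only the masses (edge counts) of one vertex's artificial copies in tour order, it assumes all masses were in $[K/2, K]$ before a single edge was inserted at or deleted from copy `i`, and the function name `rebalance` is hypothetical.

```python
def rebalance(masses, i, K):
    """Restore K/2 <= mass <= K after the mass at index i changed by one.

    masses: edge counts of v's artificial vertices, in Euler-tour order.
    The list is modified in place and also returned.
    """
    if masses[i] > K:
        # Too heavy: split into two consecutive artificial vertices.
        half = masses[i] // 2
        masses[i:i + 1] = [half, masses[i] - half]
    elif masses[i] < K / 2 and len(masses) > 1:
        # Too light: merge with a consecutive artificial vertex.
        j = i + 1 if i + 1 < len(masses) else i - 1
        lo = min(i, j)
        merged = masses[i] + masses[j]
        if merged > K:
            # The merged vertex would be too heavy, so resplit it evenly.
            half = merged // 2
            masses[lo:lo + 2] = [half, merged - half]
        else:
            masses[lo:lo + 2] = [merged]
    return masses
```

Each call touches at most two consecutive entries, so the artificial vertices remain a contiguous interval. If only one copy remains and its mass drops below $K/2$, the sketch leaves it alone, matching the rule that only vertices with $\deg(v) > K/2$ are split.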

B Dealing with Short Lists 

Until now we have assumed for simplicity that all superchunks have proper IDs in $[J]$. It is
important that we not give out IDs to short lists (consisting of fewer than $h/2$ chunks) because the
running time of the algorithm is linear in the maximum ID $J$. The modifications needed to deal
with short lists are tedious but minor.

Consider an Insert(u, v) operation where $u$ and $v$ are in lists $L_0, L_1$ and $L_1$ is a short list
consisting of one superchunk $S$ with ID(S) = $\perp$. If $L_0$ is not short (or if it is short but the
combined list $L_0L_1$ will not be short) then we retrieve an unused ID, say $i$, set ID(S) $\leftarrow i$,
set ChAdj(i,i) $\leftarrow$ $\mathrm{ChAdj}_S$, and destroy $\mathrm{ChAdj}_S$. By Invariant 12.21 $S$
was not incident to any other superchunk, so ChAdj(i,j) = 0 (the all-zero matrix) for all $j \neq i$.
At this point $S$ violates Invariant 13.21 (it is too small), so we need to merge it with the last
superchunk in $L_0$ and resplit it along a different chunk boundary, in $O(J)$ time.

The modifications to Delete(u, v) are analogous. If we delete a tree edge $(u,v)$, splitting its
component into $T_0$ and $T_1$ with associated Euler tours $L_0$ and $L_1$, and ReplacementEdge(u, v)
fails to find an edge joining $L_0$ and $L_1$, we need to check whether $L_0$ (and $L_1$) are short. If
so let $S$ be the superchunk in $L_0$. We allocate and set $\mathrm{ChAdj}_S \leftarrow$
ChAdj(ID(S), ID(S)), then set ChAdj(ID(S), ID(S)) $\leftarrow 0$ and finally retire ID(S).
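The bookkeeping for handing out and retiring IDs can be sketched as follows. This is a toy model under stated assumptions: matrices are Python sets of (row, col) positions instead of packed word lists, proper IDs are taken to be $0,\ldots,J-1$, and the class name `SuperchunkIds` with its methods `promote` and `retire` is hypothetical. The merge-and-resplit step that restores the size invariant is omitted.

```python
class SuperchunkIds:
    """Tracks which superchunks hold proper IDs and where their matrices live."""

    def __init__(self, max_ids):
        # Unused IDs, stored so that pop() hands out the smallest one first.
        self.free = list(range(max_ids - 1, -1, -1))
        self.ch_adj = {}    # (i, j) -> set of (row, col) positions of 1s
        self.private = {}   # superchunk name -> its own matrix while ID is undefined

    def promote(self, s):
        """The list containing s stops being short: give s a proper ID."""
        i = self.free.pop()
        # ChAdj(i, i) <- ChAdj_S, then destroy ChAdj_S.
        self.ch_adj[(i, i)] = self.private.pop(s)
        return i

    def retire(self, s, i):
        """The list containing s became short: move its matrix out and free i."""
        # ChAdj_S <- ChAdj(i, i), then ChAdj(i, i) <- 0 (absent means all-zero).
        self.private[s] = self.ch_adj.pop((i, i), set())
        self.free.append(i)
```

Representing an all-zero matrix by an absent dictionary entry mirrors the observation that a freshly promoted superchunk is adjacent to no other superchunk.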

The implementation of ReplacementEdge($L_0$, $L_1$) is different if $L_0$ and $L_1$ were originally
in a short list $L$ = Euler($T$) before a tree edge in $T$ was deleted. Suppose $L$ originally had one
superchunk $S$, whose chunk adjacency was stored in $\mathrm{ChAdj}_S$. After $O(1)$ splits and joins,
both $L_0$’s chunks and $L_1$’s chunks occupy $O(1)$ intervals of the rows and columns of
$\mathrm{ChAdj}_S$. Of course $\mathrm{ChAdj}_S$ is represented as a list of its 1 positions in
row-major order, so we can isolate the correct intervals of rows and columns in $O(h^2\log^3 h/w)$
time. If there is any 1 there, say at location $\mathrm{ChAdj}_S(k,l)$, then we know that there is an
edge between $L_0$ and $L_1$, and can find it in $O(K)$ time by examining chunks $k$ and $l$. The
permutation of rows/columns in $\mathrm{ChAdj}_S$ must be updated to reflect any splits and joins that
take place, and if no replacement edge is discovered, $\mathrm{ChAdj}_S$ must be split into two lists
representing matrices $\mathrm{ChAdj}_{S_0}$ and $\mathrm{ChAdj}_{S_1}$, to be identified with the
single superchunks $S_0$ and $S_1$ in $L_0$ and $L_1$, respectively.
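The search for a 1 inside the isolated row and column intervals can be illustrated with an unpacked stand-in. In this sketch the matrix is a sorted Python list of (row, col) pairs, intervals are inclusive, and the function name `find_one` is hypothetical. It shows the logic only and does not achieve the packed-word time bound.

```python
from bisect import bisect_left


def find_one(ones, row_ranges, col_ranges):
    """Return some (k, l) with a 1 whose row lies in row_ranges and whose
    column lies in col_ranges, or None if there is no such entry.

    ones: (row, col) positions in row-major (sorted) order, indices >= 1.
    row_ranges, col_ranges: lists of inclusive (lo, hi) intervals.
    """
    for row_lo, row_hi in row_ranges:
        # Jump to the first entry whose row is at least row_lo.
        start = bisect_left(ones, (row_lo, 0))
        for r, c in ones[start:]:
            if r > row_hi:
                break
            if any(lo <= c <= hi for lo, hi in col_ranges):
                return (r, c)
    return None
```

With the rows set to the intervals occupied by one list's chunks and the columns set to those of the other list, a returned pair names two chunks that share an edge.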
C Summary of Prior Work


Worst Case Data Structures

| Ref. | Update Time | Query Time | Notes |
|---|---|---|---|
| [10] | $O(\sqrt{m})$ | $O(1)$ | |
| [6, 10] | $O(\sqrt{n})$ | $O(1)$ | [10] + sparsification [6]. |
| [?], [13] | $O(c\log^5 n)$, $O(c\log^4 n)$ | $O(\log n/\log\log n)$ | Randomized Monte Carlo; no connectivity witness; $n^c$ opers. err with prob. $n^{-c}$. |
| new | $O\left(\sqrt{n(\log\log n)^2/\log n}\right)$ | $O(1)$ | $w = O(\log n)$ |

Amortized Data Structures

| Ref. | Amort. Update | W.C. Query | Notes |
|---|---|---|---|
| [?] | $O(\log^3 n)$ | $O(\log n/\log\log n)$ | Randomized Las Vegas. |
| [?] | $O(\log^2 n)$ | $O(\log n/\log\log n)$ | Randomized Las Vegas. |
| [?] | $O(\log^2 n)$ | $O(\log n/\log\log n)$ | |
| [?] | $O(\log n(\log\log n)^3)$ | $O(\log n/\log\log\log n)$ | Randomized Las Vegas. |
| [?] | $O(\log^2 n/\log\log n)$ | $O(\log n/\log\log n)$ | |

Amort./Worst Case Lower Bounds

| Ref. | Update Time $t_u$ | Query Time $t_q$ | Notes |
|---|---|---|---|
| [?] | | $t_q = \Omega(\log n/\log(t_u\log n))$ | |
| [?] | $t_u = \Omega(\log n/\log(t_q/t_u))$ | $t_q = \Omega(\log n/\log(t_u/t_q))$ | Implies $\max\{t_u, t_q\} = \Omega(\log n)$. |
| [?] | $o(\log n)$ implies | | |




Table 1: A survey of dynamic connectivity results. The lower bounds hold in the cell probe model
with word size $w = O(\log n)$.